Speechdat multilingual speech databases for teleservices: across the finish line
Authors
Abstract
The goal of the SpeechDat project is to develop spoken language resources for speech recognisers suitable for realising voice-driven teleservices. SpeechDat created speech databases for all official languages of the European Union and for some major dialectal varieties and minority languages. The databases range in size from 500 to 5000 speakers. In total, 20 databases were recorded over the fixed telephone network and 5 over the cellular network, and 3 databases are designed for speaker verification. The project has now been successfully completed. This paper briefly describes the project and addresses the validation of the databases, their availability to consortium members and third parties, publicity and awareness, and the spin-off of the project in speech recognition research.
Similar resources
SpeechDat Experiences in Creating Large Multilingual Speech Databases for Teleservices
In this article, experiences in creating large multilingual speech databases for teleservices within a large consortium are reported in order to inspire, facilitate, or compare the set-up and progress of other enterprises for collecting large speech databases. The focus will be on the following aspects: Objectives, benefits, and strategy; project organization; database contents and creation; va...
Speechdat-car: Speech Databases for Voice Driven Teleservices and Control of In-car Applications
The SpeechDat-Car project, included in the 4th Framework of the European Community's Language Engineering Programme, started in April 1998 with a duration of 30 months. It is a common initiative of car manufacturers, telephone communications operators, companies active in voice operated services and Universities that aims at collecting a set of speech databases in nine different languages to suppo...
Speechdat-e: five eastern european speech databases for voice-operated teleservices completed
In the Speechdat-E project five medium large telephone speech databases have been collected for Czech, Hungarian, Polish, Russian, and Slovak. The project was recently concluded. This paper reports briefly on the contents of the databases, elaborates on experiences gained from the data recordings and from the validation of the databases. The availability of the databases to the public is addres...
Two Swedish Speechdat databases - some experiences and results
The objective of the EU-funded SpeechDat project was to create large-scale speech databases for voice-driven teleservices. This paper deals with the design of two such Swedish resources: 5000 speakers over the fixed telephone network, and 1000 over the mobile network. It also reports on experiences from speaker recruitment and presents statistics on speaker distribution. Results regarding ortho...
SpeechDat Cymru: A Large-scale Welsh Telephony Database
We describe the collection of SpeechDat Cymru, a 2000-speaker speech recognition database for the Welsh language, recorded over the public switched telephone network (PSTN). It is collected as part of SpeechDat(II), an ELRA project which deals with the creation of databases in over 20 different European languages and dialects. Design issues common to all SpeechDat(II) databases are discussed, i...
Publication date: 1999